Interfaces
Meta Glasses hands-on: Ray-Ban is out, Kylie Jenner is in
After years of releasing smart glasses that bore the Ray-Ban or Oakley brand, Meta has finally made its own (although still in collaboration with EssilorLuxottica). The company today unveiled a trio of AI Glasses -- the Fury, the Adventurer and the Meta Glasses by Kylie (labeled in some places as Starfire). The first two styles start at $299, while the variant co-designed with celebrity Kylie Jenner will cost $399. At its launch event in New York City yesterday, Meta set us up with a pair of the new glasses and a companion phone, and let us roam around the venue and its demo areas somewhat freely. The company also had multiple units of the other styles around for us to pick up and try on as we liked, so I got a good sense of all the different options available.
Meta's Very Own Smart Glasses Go on Sale Today for $299
The new Meta-branded glasses have the same camera, microphones, and chatbot as the Ray-Bans. They come in three styles, one of which was codesigned with Kylie Jenner. Smart glasses are like public transportation, according to Peter Bristol, Meta's vice president of industrial design. "People will use it when it's good enough." To reach "good enough," Meta is making its smart glasses more accessible, more customizable, and comfier to wear.
EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation
Video generation has emerged as a promising tool for world simulation, leveraging visual data to replicate real-world environments. Within this context, egocentric video generation, which centers on the human perspective, holds significant potential for enhancing applications in virtual reality, augmented reality, and gaming. However, the generation of egocentric videos presents substantial challenges due to the dynamic nature of egocentric viewpoints, the intricate diversity of actions, and the complex variety of scenes encountered. Existing datasets are inadequate for addressing these challenges effectively. To bridge this gap, we present EgoVid-5M, the first high-quality dataset specifically curated for egocentric video generation. EgoVid-5M encompasses 5 million egocentric video clips and is enriched with detailed action annotations, including 5M high-level textual descriptions and 65K fine-grained kinematic control annotations. To ensure the integrity and usability of the dataset, we implement a sophisticated data cleaning pipeline designed to maintain frame consistency, action coherence, and motion smoothness under egocentric conditions. Furthermore, we introduce EgoDreamer, which is capable of generating egocentric videos driven simultaneously by action descriptions and kinematic control signals. The EgoVid-5M dataset, associated action annotations, and all data cleansing metadata will be released for the advancement of research in egocentric video generation.
PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation
Physiological signals are often corrupted by motion artifacts, baseline drift, and other low-SNR disturbances, which pose significant challenges for analysis. Additionally, these signals exhibit strong non-stationarity, with sharp peaks and abrupt changes that evolve continuously, making them difficult to represent using traditional time-domain or filtering methods. To address these issues, a novel wavelet-based approach for physiological signal analysis is presented, aiming to capture multi-scale time-frequency features in various physiological signals. Leveraging this technique, two large-scale pretrained models specific to EMG and ECG are introduced for the first time, achieving superior performance and setting new baselines in downstream tasks. Additionally, a unified multi-modal framework is constructed by integrating a pretrained EEG model, where each modality is guided through its dedicated branch and fused via learnable weighted fusion. This design effectively addresses challenges such as low signal-to-noise ratio, high inter-subject variability, and device mismatch, outperforming existing methods on multi-modal tasks. The proposed wavelet-based architecture lays a solid foundation for analysis of diverse physiological signals, while the multi-modal design points to next-generation physiological signal processing with potential impact on wearable health monitoring, clinical diagnostics, and broader biomedical applications.
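As a rough illustration of what "multi-scale time-frequency features" means in a wavelet setting, the sketch below runs a multi-level discrete wavelet decomposition in plain Python. It is not the PhysioWave architecture: the abstract does not name a wavelet family, so the Haar wavelet is an assumption chosen for simplicity, and the function names `haar_step` and `haar_decompose` are hypothetical.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform.

    Returns (approximation, detail): scaled pairwise sums capture the
    slow trend, scaled pairwise differences capture abrupt changes.
    """
    scale = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * scale
              for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * scale
              for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Multi-level decomposition.

    Returns the coarsest approximation plus one list of detail
    coefficients per level, ordered from finest to coarsest scale.
    """
    if levels < 1 or len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


# Toy signal: a flat baseline with a single sharp peak at index 5.
toy_signal = [1.0, 1.0, 1.0, 1.0, 1.0, 9.0, 1.0, 1.0]
coarse, per_level_details = haar_decompose(toy_signal, levels=3)
```

In this toy case the finest-scale detail coefficients are zero everywhere except at the pair containing the peak, which is the kind of localized response that makes wavelet features attractive for non-stationary signals.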
LiteReality: Graphics-Ready 3D Scene Reconstruction from RGB-D Scans
We propose LiteReality, a novel pipeline that converts RGB-D scans of indoor environments into compact, realistic, and interactive 3D virtual replicas. LiteReality not only reconstructs scenes that visually resemble reality but also supports key features essential for graphics pipelines, such as object individuality, articulation, and high-quality physically based rendering materials. At its core, LiteReality first performs scene understanding and parses the results into a coherent 3D layout and objects, with the help of a structured scene graph.
InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding
Modern multimodal large language models (MLLMs) can reason over hour-long video, yet their key-value (KV) cache grows linearly with time--quickly exceeding the fixed memory of phones, AR glasses, and edge robots. Prior compression schemes either assume the whole video and user query are available offline or must first build the full cache, so memory still scales with stream length. InfiniPot-V is the first training-free, query-agnostic framework that enforces a hard, length-independent memory cap for streaming video understanding. During video encoding it monitors the cache and, once a user-set threshold is reached, runs a lightweight compression pass that (i) removes temporally redundant tokens via a Temporal-axis Redundancy (TaR) metric and (ii) keeps semantically significant tokens via Value-Norm (VaN) ranking. Across four open-source MLLMs and four long-video and streaming-video benchmarks, InfiniPot-V cuts peak GPU memory by up to 94%, sustains real-time generation, and matches or surpasses full-cache accuracy--even in multi-turn dialogues. By dissolving the KV cache bottleneck without retraining or query knowledge, InfiniPot-V closes the gap for on-device streaming video assistants.
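The threshold-triggered compression pass described above can be sketched roughly as follows. This is not the InfiniPot-V implementation. The class `StreamingKVCache`, its parameters (`threshold`, `target`, `redundancy_cutoff`), the use of cosine similarity against the same token position in the previous frame as a stand-in for the TaR metric, and the order in which the two criteria are applied are all assumptions made for illustration. Toy Python lists stand in for real attention tensors.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


class StreamingKVCache:
    """Toy KV cache that is compressed whenever it reaches `threshold` entries."""

    def __init__(self, threshold, target, redundancy_cutoff=0.95):
        if not 0 < target < threshold:
            raise ValueError("need 0 < target < threshold")
        self.threshold = threshold                   # user-set trigger size
        self.target = target                         # max size after compression
        self.redundancy_cutoff = redundancy_cutoff   # assumed similarity cutoff
        self.entries = []                            # dicts: frame, pos, key, value
        self.peak = 0                                # largest size ever held

    def append_frame(self, frame_idx, keys, values):
        """Add one frame's tokens, then compress if the threshold is reached."""
        for pos, (key, value) in enumerate(zip(keys, values)):
            self.entries.append(
                {"frame": frame_idx, "pos": pos, "key": key, "value": value}
            )
        self.peak = max(self.peak, len(self.entries))
        if len(self.entries) >= self.threshold:
            self._compress()

    def _compress(self):
        # (i) Temporal redundancy (assumed form): drop a token whose key is
        # nearly identical to the key at the same position one frame earlier.
        by_slot = {(e["frame"], e["pos"]): e for e in self.entries}
        kept = []
        for entry in self.entries:
            prev = by_slot.get((entry["frame"] - 1, entry["pos"]))
            if prev is not None and (
                cosine(entry["key"], prev["key"]) >= self.redundancy_cutoff
            ):
                continue
            kept.append(entry)
        # (ii) Value-norm ranking: if still over budget, keep the tokens with
        # the largest value-vector L2 norm, preserving their original order.
        if len(kept) > self.target:
            ranked = sorted(
                range(len(kept)),
                key=lambda i: l2_norm(kept[i]["value"]),
                reverse=True,
            )
            kept = [kept[i] for i in sorted(ranked[: self.target])]
        self.entries = kept
```

Because compression runs as soon as the cache reaches `threshold` and always leaves at most `target` entries, the cache never holds more than `threshold` plus one frame's worth of tokens, however long the stream runs. That bound is the sketch's counterpart of the length-independent memory cap the abstract describes.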
From Pose to Muscle: Multimodal Learning for Piano Hand Muscle Electromyography
Muscle coordination is fundamental when humans interact with the world. Reliable estimation of hand muscle engagement can serve as a source of internal feedback, supporting the development of embodied intelligence and the acquisition of dexterous skills. However, contemporary electromyography (EMG) sensing techniques either require prohibitively expensive devices or are constrained to gross motor movements, which inherently involve large muscles.